18  Making good figures

Learning Objectives

After completing this tutorial you should be able to

Download the directory for this project here, make sure the directory is unzipped and move it to your bi328 directory. You can open the Rproj for this module either by double clicking on it which will launch Rstudio or by opening Rstudio and then using File > Open Project or by clicking on the Rproject icon in the top right of your program window and selecting Open Project.

There should be a file named 18_visualizations.qmd in that project directory. Use that file to work through this tutorial - you will hand in your rendered (“knitted”) quarto file as your homework assignment. So, first thing in the YAML header, change the author to your name. You will use this quarto document to record your answers. Remember to use comments to annotate your code; at minimum you should have one comment per code set1 you may of course add as many comments as you need to be able to recall what you did. Similarly, take notes in the document as we discuss discussion/reflection questions but make sure that you go back and clean them up for “public consumption”.

  • 1 You should do this whether you are adding code yourself or using code from our manual, even if it isn’t commented in the manual… especially when the code is already included for you, add comments to describe how the function works/what it does as we introduce it during the participatory coding session so you can refer back to it.

  • Before you get started, let’s make sure to read in all the packages we will need.

    # load libraries
    
    # reporting
    library(knitr)
    
    # visualization
    library(ggplot2)
    library(ggthemes)
    library(patchwork)
    
    # data wrangling
    library(dplyr)
    library(tidyr)
    library(readr)
    library(skimr)
    library(janitor)
    library(magrittr)
    
    # turn off scientific notation
    options(scipen=999)

    18.1 Best practices for effective figures

    Effective figures are used to tell a story. You can think of your exploratory analysis as the first step in determining what story the data tells (a fact finding mission if you will). Once you have determined what your specific question hypothesis is, the next step is narrowing down what analysis you will perform. Then you will identify your central results and interpret them in the context of your question. Your next step is communicating your results. At this point you need to identify which figures serve to communicate individual, specific points in the overall narrative.

    In our example, we have refined our question to “Have impacts due to natural disasters increased at a global scale?”. To answer this question, we are calculating the total annual loss of life and annual total economic damages incurred for each disaster type.

    disaster <- read_delim("data/nat-disasters_emdat-query-2021-10-05.txt", delim = "\t", skip = 6) %>%
      clean_names()
    
    total_yr <- disaster %>%
      filter(!is.na(total_damages_000_us)) %>%
      group_by(year,disaster_type) %>%
      summarize(total_damages_yr = sum(total_damages_000_us))
    
    deaths <- disaster %>%
      filter(!is.na(total_deaths)) %>%
      group_by(year, disaster_type) %>%
      summarize(total_deaths_yr = sum(total_deaths)) 

    Then we created bubble plot showing the total loss of life and total damages incurred for each disaster type in each year.

    Impacts due to natural disasters (1900 - 2020). Impacts were calculated as total annual economic costs (top) and total annual loss of life (bottom) due to each disaster type per year. The size of the bubble is scaled to cost [$] and total deaths [100k]. Data was obtained from EMDAT/International Disaster Database
    Consider this

    Summarize the overall narrative for the “story” we want to communicate based on our central question and state the specific point that each of these figures makes.

    Did it!

    [Your answer here]

    Consider this

    Data visualizations should be clear, accessible, transparent, and honest. For each of these categories list at least three criteria/guidelines that detail how data visualizations fulfill this category. Assess whether our figure(s) above fulfill these criteria and list at least three things that could be improved based this set of criteria.

    Did it!

    [Your answer here]

    18.2 Best practices for communicating results

    Visualizations are a key component to effectively communicating your results. A good rule of thumb is that your written description of your results should allow the reader to understand what you are trying to communicate even without visualizations and that your title & legend should be descriptive enough that even without the written results (some would even say methods) the reader would understand the results as well.

    Consider this

    Give a description of what the components of an effective figure title & legend should look like and assess whether the figure legend above fulfills these criteria. List points for improvement.

    Did it!

    [Your answer here]

    Consider this

    Give a description of what an effective description of results should look like and write a description for the figure(s) above.

    Did it!

    [Your answer here]

    Consider this

    Give a description of the components of an effective discussion and write a brief discussion of the results of our example above (Refer back to the end of chapter 16 for this …you can use your answer to the last question as a starting point).

    Did it!

    [Your answer here]

    18.3 Practice the best practices!

    Consider this

    Use one of the questions you explored for chapter 17 and complete the following tasks:

    • State the overall broad question you are asking
    • Summarize your “methods” in 2-3 sentences: How are you transforming/analyzing the data to specifically answer your question?
    • Summarize the narrative of the story you want to communicate with this analysis and specify how the visualization serves the overall narrative.
    • Generate your figure. Use bullet points to assess whether your visualization is clear, accessible, transparent, and honest.
    • Write an effective title/legend for your figure.
    • Write an effective description of your results.
    • Write an effective interpretation discussion/interpretation of your results.
    Did it!

    [Your answer here]

    18.4 Spatial visualization

    You have probably noticed that for some of the geographic comparisons were are making the most effective way to communicate those results would be to create a map with countries color coded according to the metric we are assessing.

    Let’s learn a straightforward method for visualization of spatial data. Run each of the code chunks below and comment the code line by line to describe what each function/argument is doing.

    Install these packages as needed:

    install.packages("rnaturalearth")
    
    install.packages("rnaturalearthdata")
    
    install.packages("sf")

    Now we can load these libraries.

    library(rnaturalearth)
    library(rnaturalearthdata)
    library(sf)

    The first thing we need is is some maps! The package rnaturalearth provides a map of countries of the entire world. Let’s load a map as an sf class.

    world <- ne_countries(scale = "medium", returnclass = "sf")
    
    class(world)
    [1] "sf"         "data.frame"

    We see that world is both a data.frame and a class sf(simple features) which is a class of R object designed specifically for plotting maps.

    Let’s take a look at the column names

    colnames(world)
     [1] "scalerank"  "featurecla" "labelrank"  "sovereignt" "sov_a3"    
     [6] "adm0_dif"   "level"      "type"       "admin"      "adm0_a3"   
    [11] "geou_dif"   "geounit"    "gu_a3"      "su_dif"     "subunit"   
    [16] "su_a3"      "brk_diff"   "name"       "name_long"  "brk_a3"    
    [21] "brk_name"   "brk_group"  "abbrev"     "postal"     "formal_en" 
    [26] "formal_fr"  "note_adm0"  "note_brk"   "name_sort"  "name_alt"  
    [31] "mapcolor7"  "mapcolor8"  "mapcolor9"  "mapcolor13" "pop_est"   
    [36] "gdp_md_est" "pop_year"   "lastcensus" "gdp_year"   "economy"   
    [41] "income_grp" "wikipedia"  "fips_10"    "iso_a2"     "iso_a3"    
    [46] "iso_n3"     "un_a3"      "wb_a2"      "wb_a3"      "woe_id"    
    [51] "adm0_a3_is" "adm0_a3_us" "adm0_a3_un" "adm0_a3_wb" "continent" 
    [56] "region_un"  "subregion"  "region_wb"  "name_len"   "long_len"  
    [61] "abbrev_len" "tiny"       "homepart"   "geometry"  

    You can see that it contains quote a bit of information including some data about each country. The column geometry has the information about the country “shapes” (multipolygons) we want to plot.

    world %>%
      select(geometry) %>%
      head()
    Simple feature collection with 6 features and 0 fields
    Geometry type: MULTIPOLYGON
    Dimension:     XY
    Bounding box:  xmin: -70.06611 ymin: -18.01973 xmax: 74.89131 ymax: 60.40581
    Geodetic CRS:  +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
                            geometry
    1 MULTIPOLYGON (((-69.89912 1...
    2 MULTIPOLYGON (((74.89131 37...
    3 MULTIPOLYGON (((14.19082 -5...
    4 MULTIPOLYGON (((-63.00122 1...
    5 MULTIPOLYGON (((20.06396 42...
    6 MULTIPOLYGON (((20.61133 60...

    We can plot the map using ggplot::geom_sf() and the same syntax we generally use. We do not need to specify x or y coordinates as ggplot recognizes that we have passed a object of the class sf and that it contains a column called geometry.

    ggplot(data = world) +
      geom_sf()

    We can manipulate the geom_sf layer using the same arguments we have used for other plot types. For example, let’s make all the countries orange.

    ggplot(data = world) + 
        geom_sf(color = "black", fill = "orange") 

    Or, we could color code them according to the column pop_est in the world data.frame which contains population estimates for each country.

    ggplot(data = world) +
        geom_sf(aes(fill = pop_est)) +
        scale_fill_viridis_c(trans = "sqrt") +
      coord_sf() 

    Let’s say we want to create a map that compares the number of droughts that occurred in each country from 2010 - 2020. First we would need to transform our raw data.

    droughts <- disaster %>%
      filter(year >= 2010 & year <= 2020) %>%
      group_by(iso) %>%
      summarize(droughts_total = n())

    Now we can use left_join() to join the world and drought data.frames. Note that I used the column iso to calculate the number of droughts, this is equivalent to the column iso_a3 in the world database. These are internationally recognized codes that designate every country and most independent areas with either a two or in this case three-letter abbreviations.

    world <- world %>%
      left_join(droughts, by = c("iso_a3" = "iso"))
    
    head(world)
    Simple feature collection with 6 features and 64 fields
    Geometry type: MULTIPOLYGON
    Dimension:     XY
    Bounding box:  xmin: -70.06611 ymin: -18.01973 xmax: 74.89131 ymax: 60.40581
    Geodetic CRS:  +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
      scalerank      featurecla labelrank     sovereignt sov_a3 adm0_dif level
    1         3 Admin-0 country         5    Netherlands    NL1        1     2
    2         1 Admin-0 country         3    Afghanistan    AFG        0     2
    3         1 Admin-0 country         3         Angola    AGO        0     2
    4         1 Admin-0 country         6 United Kingdom    GB1        1     2
    5         1 Admin-0 country         6        Albania    ALB        0     2
    6         3 Admin-0 country         6        Finland    FI1        1     2
                   type       admin adm0_a3 geou_dif     geounit gu_a3 su_dif
    1           Country       Aruba     ABW        0       Aruba   ABW      0
    2 Sovereign country Afghanistan     AFG        0 Afghanistan   AFG      0
    3 Sovereign country      Angola     AGO        0      Angola   AGO      0
    4        Dependency    Anguilla     AIA        0    Anguilla   AIA      0
    5 Sovereign country     Albania     ALB        0     Albania   ALB      0
    6           Country       Aland     ALD        0       Aland   ALD      0
          subunit su_a3 brk_diff        name     name_long brk_a3    brk_name
    1       Aruba   ABW        0       Aruba         Aruba    ABW       Aruba
    2 Afghanistan   AFG        0 Afghanistan   Afghanistan    AFG Afghanistan
    3      Angola   AGO        0      Angola        Angola    AGO      Angola
    4    Anguilla   AIA        0    Anguilla      Anguilla    AIA    Anguilla
    5     Albania   ALB        0     Albania       Albania    ALB     Albania
    6       Aland   ALD        0       Aland Aland Islands    ALD       Aland
      brk_group abbrev postal                    formal_en formal_fr note_adm0
    1      <NA>  Aruba     AW                        Aruba      <NA>     Neth.
    2      <NA>   Afg.     AF Islamic State of Afghanistan      <NA>      <NA>
    3      <NA>   Ang.     AO  People's Republic of Angola      <NA>      <NA>
    4      <NA>   Ang.     AI                         <NA>      <NA>      U.K.
    5      <NA>   Alb.     AL          Republic of Albania      <NA>      <NA>
    6      <NA>  Aland     AI                Åland Islands      <NA>      Fin.
      note_brk   name_sort name_alt mapcolor7 mapcolor8 mapcolor9 mapcolor13
    1     <NA>       Aruba     <NA>         4         2         2          9
    2     <NA> Afghanistan     <NA>         5         6         8          7
    3     <NA>      Angola     <NA>         3         2         6          1
    4     <NA>    Anguilla     <NA>         6         6         6          3
    5     <NA>     Albania     <NA>         1         4         1          6
    6     <NA>       Aland     <NA>         4         1         4          6
       pop_est gdp_md_est pop_year lastcensus gdp_year                    economy
    1   103065     2258.0       NA       2010       NA       6. Developing region
    2 28400000    22270.0       NA       1979       NA  7. Least developed region
    3 12799293   110300.0       NA       1970       NA  7. Least developed region
    4    14436      108.9       NA         NA       NA       6. Developing region
    5  3639453    21810.0       NA       2001       NA       6. Developing region
    6    27153     1563.0       NA         NA       NA 2. Developed region: nonG7
                   income_grp wikipedia fips_10 iso_a2 iso_a3 iso_n3 un_a3 wb_a2
    1 2. High income: nonOECD        NA    <NA>     AW    ABW    533   533    AW
    2           5. Low income        NA    <NA>     AF    AFG    004   004    AF
    3  3. Upper middle income        NA    <NA>     AO    AGO    024   024    AO
    4  3. Upper middle income        NA    <NA>     AI    AIA    660   660  <NA>
    5  4. Lower middle income        NA    <NA>     AL    ALB    008   008    AL
    6    1. High income: OECD        NA    <NA>     AX    ALA    248   248  <NA>
      wb_a3 woe_id adm0_a3_is adm0_a3_us adm0_a3_un adm0_a3_wb     continent
    1   ABW     NA        ABW        ABW         NA         NA North America
    2   AFG     NA        AFG        AFG         NA         NA          Asia
    3   AGO     NA        AGO        AGO         NA         NA        Africa
    4  <NA>     NA        AIA        AIA         NA         NA North America
    5   ALB     NA        ALB        ALB         NA         NA        Europe
    6  <NA>     NA        ALA        ALD         NA         NA        Europe
      region_un       subregion                 region_wb name_len long_len
    1  Americas       Caribbean Latin America & Caribbean        5        5
    2      Asia   Southern Asia                South Asia       11       11
    3    Africa   Middle Africa        Sub-Saharan Africa        6        6
    4  Americas       Caribbean Latin America & Caribbean        8        8
    5    Europe Southern Europe     Europe & Central Asia        7        7
    6    Europe Northern Europe     Europe & Central Asia        5       13
      abbrev_len tiny homepart droughts_total                       geometry
    1          5    4       NA             NA MULTIPOLYGON (((-69.89912 1...
    2          4   NA        1             65 MULTIPOLYGON (((74.89131 37...
    3          4   NA        1             26 MULTIPOLYGON (((14.19082 -5...
    4          4   NA       NA              1 MULTIPOLYGON (((-63.00122 1...
    5          4   NA        1             12 MULTIPOLYGON (((20.06396 42...
    6          5    5       NA             NA MULTIPOLYGON (((20.61133 60...

    Now our world data.frame has a column specifying the number of droughts that occurred.

    ggplot(data = world) +
        geom_sf(aes(fill = droughts_total)) +
        scale_fill_viridis_c(trans = "sqrt") +
        coord_sf(expand = FALSE) +
      labs(x = "longitude", y = "latitude") +
      theme_bw() +
      theme(legend.position = "bottom")

    Consider this

    Use the figure we just generated and go through our steps for creating good visualizations according to our best practices:

    • State the overall broad question you are asking
    • Summarize your “methods” in 2-3 sentences: How are you transforming/analyzing the data to specifically answer your question?
    • Summarize the narrative of the story you want to communicate with this analysis and specify how the visualization serves the overall narrative.
    • Generate your figure - use the code above but optimize the figure as needed. Use bullet points to assess whether your visualization is clear, accessible, transparent, and honest.
    • Write an effective title/legend for your figure.
    • Write an effective description of your results.
    • Write an effective interpretation discussion/interpretation of your results.
    Did it!

    [Your answer here]